ABSTRACT


This work presents a GPU (Graphics Processing Unit) accelerated, spatial-domain face resolution enhancement algorithm based on the homogeneity levels and relative-ratios of pixels with respect to their surrounding pixels. The algorithm has been developed, implemented and tested in the MATLAB environment. MATLAB is slow in processing, but it is a resourceful environment for development in the area of image processing owing to its extremely rich set of functions and its programmer-friendly integrated development environment. To compensate for the speed loss in the testing and implementation phases, we have made use of GPU computing, i.e. parallelized the algorithm on an NVIDIA GPU using the CUDA (Compute Unified Device Architecture) interface in the MATLAB environment. It is a simple but efficient algorithm in which kernel matrices are created that encode the homogeneity levels and relative-ratios of all pixels in the four surrounding quadrants. The kernel matrices are subsequently applied to reconstruct the HR (High-Resolution) version of the input LR (Low-Resolution) facial image.
INTRODUCTION

The processing of facial images is an essential task in smart video surveillance systems. The resolution of facial images from surveillance videos is usually very low owing to various factors, such as hardware constraints and the distance between the subject and the camera. Therefore, resolution enhancement is a major step towards the overall improvement of face detection and recognition. As video processing demands fast computation, the proposed algorithm has been parallelized on an NVIDIA Graphics Processing Unit (GPU) using CUDA (Compute Unified Device Architecture) under the MATLAB environment.

Facial image processing systems can be classified based on different attributes, such as the number of input facial images, the face angle, and the quality and source of the images. Some techniques use a single input LR (Low-Resolution) image, while others use multiple LR images, generally captured from consecutive frames of a video. The latter reconstruct the HR image from multiple LR images and are therefore called reconstruction based techniques. The other type of techniques, which have attracted a large amount of research work in the past decade, are the learning based techniques. In these, the relationship between single or multiple LR images and the corresponding HR image is learned via a machine learning or other intelligent technique. Further, optimization techniques may be applied to tune the weights and improve the results. Most learning based methods take a single facial image as input and use the trained system or dictionary to estimate the missing HR details. Reconstruction based models rely on generalized smoothness priors, while learning based techniques use recognition based priors.

Based upon the region of interest, human facial image processing systems can be classified into two major categories. Methods which use the positions of facial landmarks such as the left eye, right eye, nose tip and other facial regions are called local methods (Yang et al. (2010), Liu and Yang (2014)), while techniques based on template matching are called global methods. Local methods generally require input facial images of reasonably high quality so that the facial landmarks can be extracted, which in turn depends on the facial image resolution. Facial landmarks are easier to localize in high-resolution facial images than in low-resolution ones, which necessitates the task of face resolution enhancement.

The development of image processing techniques faces certain barriers, such as the computational load of handling huge amounts of image data, which may involve computation on pixels or on their extracted features. At the same time, support from the development environment largely affects the development time. MATLAB is an ideal environment for research in image processing because of the rich set of functions available in its Image Processing, Computer Vision and related toolboxes. However, MATLAB suffers severely in computational performance, as applications written in MATLAB do not run directly on the CPU or operating system but on the MATLAB engine. On the other hand, developing CUDA kernels in C or Fortran to harness the power of the GPU directly requires writing a lot of glue code for tasks such as communicating data between the GPU and CPU, managing GPU memory, initializing and launching CUDA kernels, and visualizing their output. This makes it difficult to evaluate and test CUDA kernels over the whole input data space and to visualize their corresponding output. MATLAB's Parallel Computing Toolbox overcomes this limitation, as its GPU computing libraries make it possible to quickly develop, evaluate, visualize and analyze CUDA kernels.

In this work, we have improved and GPU-accelerated our previous work on face resolution enhancement (Mutneja and Singh (2017)). This paper is structured as follows: Section 2 provides the Literature Survey. The proposed technique based on pixels-homogeneity and relative-ratios is explained in Section 3, Section 4 describes the experimental setup, Sections 5 and 6 present the Results and Discussion respectively, and finally, Sections 7 and 8 give the Conclusions and Future Scope respectively.

Literature Survey 

Many researchers have worked in the area of reconstruction based face super-resolution, such as Baker and Kanade (2002), Jiang et al. (2016a) and Yang et al. (2010). A lot of work has been done in learning based face resolution enhancement, known as face hallucination, such as by Yin et al. (2016) and Tu et al. (2017). Some works have hybridized reconstruction and learning, such as Yeganli et al. (2016). Many spatial domain based techniques rely on neighbor embedding, such as Chang et al. (2004), Huang et al. (2016) and Jiang et al. (2016b).


Some researchers have worked in the field of sparse representation based face super-resolution. Yang et al. (2010) presented single image super-resolution by estimating a sparse representation for each patch of the low-resolution input, to be applied with a high-resolution image patch dictionary. Zhang et al. (2016) proposed a noise robust method based on information entropy and regression to weight facial patches for estimating the high frequency details from input low-resolution facial images. Huang et al. (2017) proposed a novel sparse coding-based face hallucination method that incorporates the intrinsic geometric structure of the training samples into dictionary learning. The authors minimized artificial effects using graph construction in the HR manifold, and K-selection mean constraints to find the optimal weights for HR face reconstruction. Rahiman and George (2017) worked on single image super-resolution using neighbor embedding and sparse representation based learning with a partitioned feature space and a statistical prediction model. Mao et al. (2016) proposed a new weighted-patch super-resolution method that uses AdaBoost iteratively to focus more on the patches (face regions) with richer information, improving the reconstruction power.

Dahl et al. (2017) presented a pixel recursive super-resolution model that uses convolutional neural networks to synthesize realistic details into images while enhancing their resolution, and represented a multi-modal conditional distribution by accurately modeling the statistical dependencies among the high-resolution image pixels, conditioned on a low-resolution input.

Chen et al. (2017) proposed super-resolution of noisy facial images using contour features to handle noise, and a standard deviation prior as a statistical measure to enhance the low quality contour features. Jiang et al. (2017) developed a facial image super-resolution method based on missing image interpolation information using ‘Smooth Regression and a novel Local Structure Prior’ (SRLSP). Tu et al. (2017) proposed a face hallucination method using a direct combined approach, independent of the size and number of training samples. Hui et al. (2017) developed a single-LR-image face resolution enhancement technique based on appearance, geometrical features and an optical flow model. Gong and Wang (2017) developed face hallucination based on similarity measurements between a single LR image and the corresponding multiple HR training images.

Proposed Technique 

In this work, GPU acceleration of a face resolution enhancement technique is proposed, based on the hybridization of the homogeneity levels and relative-ratios of pixels in LR patches, using window-based scanning of the whole input facial image. Kernel matrices are generated to encode the homogeneity levels as well as the relative-ratios, and are then used to construct the resultant HR image. Each LR patch is placed at the nucleus (center) of an HR patch, and the remaining pixels of the HR patch are computed by applying the kernel matrices. The size of the LR patch has been selected as 3×3, which is expanded to an HR patch of size 5×5. To generate the homogeneity kernel matrix, the LR window is slid over the whole input LR image and the homogeneity levels of the pixels in the four surrounding quadrants are checked. For each quadrant, the differences of the pixel intensities with respect to the central pixel are summed; if the sum does not exceed the set threshold, the quadrant is marked as homogeneous, otherwise as non-homogeneous. Figure 1 shows the division of the LR patch into four surrounding quadrants. Algorithm 1 shows the working of the GPU CUDA kernel for generating the homogeneity matrix.
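The 3×3 to 5×5 "envelope" idea can be illustrated with a minimal Python sketch. It is not the authors' code and rests on stated assumptions: the LR patch occupies the nucleus of the HR patch (Algorithm 2, step 13); the quadrant layout follows Figure 1 (Q1 top-left, Q2 top-right, Q3 bottom-right, Q4 bottom-left); each quadrant mean A_i is assumed to be taken over the 2×2 LR corner block that includes the central pixel; and a homogeneous quadrant's mean is assumed to fill only the still-unassigned pixels of the matching 3×3 corner of the envelope (step 19). The difference-matrix step for non-homogeneous quadrants is deliberately not reproduced. The names make_envelope, fill_homogeneous, LR_CORNERS and HR_CORNERS are hypothetical.

```python
# Hypothetical sketch of the 3x3 -> 5x5 envelope (not the authors' code).
# Assumptions: Figure 1 layout; quadrant mean A_i over the 2x2 LR corner block
# that includes the central pixel; non-homogeneous corners stay None here.

# Top-left index of each quadrant's 2x2 block inside the 3x3 LR patch
LR_CORNERS = {1: (0, 0), 2: (0, 1), 3: (1, 1), 4: (1, 0)}
# Top-left index of each quadrant's 3x3 corner inside the 5x5 envelope T
HR_CORNERS = {1: (0, 0), 2: (0, 2), 3: (2, 2), 4: (2, 0)}


def make_envelope(lr_patch):
    """Place the 3x3 LR patch at the nucleus (rows/cols 1..3, 0-based) of T."""
    t = [[None] * 5 for _ in range(5)]
    for r in range(3):
        for c in range(3):
            t[r + 1][c + 1] = lr_patch[r][c]
    return t


def fill_homogeneous(t, lr_patch, hcode):
    """Fill unassigned pixels of each homogeneous corner with its quadrant mean.

    Bit i-1 of hcode set means quadrant Qi is homogeneous.
    """
    for i in (1, 2, 3, 4):
        if not hcode & (1 << (i - 1)):
            continue  # non-homogeneous corner: not handled in this sketch
        lr_r, lr_c = LR_CORNERS[i]
        block = [lr_patch[lr_r + dr][lr_c + dc] for dr in (0, 1) for dc in (0, 1)]
        mean = sum(block) / len(block)
        hr_r, hr_c = HR_CORNERS[i]
        for dr in range(3):
            for dc in range(3):
                if t[hr_r + dr][hr_c + dc] is None:  # only unassigned pixels
                    t[hr_r + dr][hr_c + dc] = mean
    return t


patch = [[10, 20, 30],
         [10, 20, 30],
         [10, 20, 30]]
only_q1 = fill_homogeneous(make_envelope(patch), patch, 0b0001)
print(sum(v is None for row in only_q1 for v in row))  # 11
```

With only Q1 homogeneous, five of the sixteen ring pixels receive the Q1 mean (15.0) and eleven remain for the other quadrants; with all four bits set, the "first unassigned wins" rule fills the whole ring exactly once.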

Algorithm 1 GPU CUDA Kernel for Pixels-Homogeneity Matrix Generation

1: Load the Kernel Constants: NR: Number of Rows, NC: Number of Columns of input image/video frame (SI)

2: Load the Kernel Arguments: Pointer to Input Image (*SI), Pointer to Output Homogeneity Kernel Matrix (*SI_H) to hold the status of Homogeneity levels of all pixels in image, of size (NR * NC)

3: Copy Image to Shared Memory for better efficiency.

4: Load the pixel co-ordinates from the block indices (one thread per block): X = blockIdx.x, Y = blockIdx.y
5: Compute the linear index of the pixel to be operated on (column-major, as in MATLAB): linearIdx = Y * NR + X

6: Initialize the Difference Matrix (D) to hold the difference of the pixels in the scanning window (3×3) with respect to the central pixel

7: Initialize HCODE = 0 to hold the homogeneity levels of the pixel, quadrant-wise in the lower nibble (Q4Q3Q2Q1)

8: Compute the values of Difference Matrix D (3×3), considering SI(X, Y) as the central pixel

9: Compute the sum of values in each of the four quadrants of D, stored as H1, H2, H3, H4

10: Loop1: For i = 1:4

11: if Hi <= Th (threshold limit) then set the bit of quadrant Qi in HCODE: HCODE = bitor(HCODE, bitshift(1, i-1))

12: Iterate Loop1 for all four quadrants

13: Assign HCODE to the respective location in the output kernel matrix: SI_H[linearIdx] = HCODE

14: Return
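Algorithm 1 can be emulated serially in Python to check its bit logic. This is a sketch, not the authors' CUDA kernel, under these assumptions: each quadrant is the 2×2 corner of the 3×3 window that shares the central pixel (Figure 1); absolute differences are summed so that positive and negative deviations cannot cancel (the paper does not state whether D is signed); border pixels are skipped and keep code 0; and the two loops over X and Y stand in for blockIdx.x and blockIdx.y of the NR×NC×1 grid. The output is flattened column-major, as MATLAB stores arrays. The names QUADRANTS, homogeneity_code and homogeneity_matrix are hypothetical.

```python
# Serial emulation of Algorithm 1 (a sketch, not the authors' CUDA kernel).
# Assumptions: 2x2 quadrants sharing the central pixel (Figure 1); absolute
# differences; border pixels skipped (code 0).

# Offsets (dr, dc) of the non-central pixels in each quadrant
QUADRANTS = {
    1: ((-1, -1), (-1, 0), (0, -1)),  # Q1: top-left
    2: ((-1, 0), (-1, 1), (0, 1)),    # Q2: top-right
    3: ((0, 1), (1, 0), (1, 1)),      # Q3: bottom-right
    4: ((0, -1), (1, -1), (1, 0)),    # Q4: bottom-left
}


def homogeneity_code(img, r, c, threshold):
    """HCODE for pixel (r, c): bit i-1 is set when quadrant Qi is homogeneous."""
    center = img[r][c]
    hcode = 0
    for i, offsets in QUADRANTS.items():
        # H_i: summed absolute difference from the central pixel within Qi
        h = sum(abs(img[r + dr][c + dc] - center) for dr, dc in offsets)
        if h <= threshold:               # step 11: Hi <= Th
            hcode |= 1 << (i - 1)
    return hcode


def homogeneity_matrix(img, threshold):
    """One 'block' per pixel; output flattened column-major like MATLAB."""
    nr, nc = len(img), len(img[0])
    si_h = [0] * (nr * nc)
    for x in range(nr):                  # stands in for blockIdx.x
        for y in range(nc):              # stands in for blockIdx.y
            if 0 < x < nr - 1 and 0 < y < nc - 1:
                si_h[y * nr + x] = homogeneity_code(img, x, y, threshold)
    return si_h


img = [[10, 10, 10, 90],
       [10, 10, 10, 90],
       [10, 10, 10, 90]]
codes = homogeneity_matrix(img, threshold=30)
print(codes[1 * 3 + 1], codes[2 * 3 + 1])  # 15 9
```

Pixel (1, 1) sits in a flat region, so all four bits are set (0b1111 = 15); pixel (1, 2) borders the bright right-hand column, so only Q1 and Q4 stay within the threshold (0b1001 = 9).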


To generate the kernel matrix for the relative-ratios, the ratio of the intensity level of every pixel in the LR patch to that of the central pixel is computed. The pixels are iterated in the same way as in Algorithm 1. After the kernel matrices are generated, they are applied to the input image to create the corresponding HR image. Algorithm 2 outlines the steps of the proposed technique for generating the HR image by applying the homogeneity and relative-ratios kernel matrices. The GPU kernel thread block size has been set as 1×1×1, while the grid size has been set as NR×NC×1, so that each pixel of the input image is processed by its own single-thread block.
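The relative-ratio computation for a single pixel can be sketched in Python as follows. The function name is hypothetical, and because the paper does not say how a zero-valued central pixel is handled, the sketch assumes it is treated as 1 to avoid division by zero.

```python
def relative_ratios(img, r, c):
    """Return the 3x3 patch of ratios img[r+dr][c+dc] / img[r][c].

    Assumption: a zero central pixel is replaced by 1 to avoid division
    by zero; (r, c) must not lie on the image border.
    """
    center = img[r][c] or 1
    return [[img[r + dr][c + dc] / center for dc in (-1, 0, 1)]
            for dr in (-1, 0, 1)]


img = [[50, 100, 150],
       [100, 100, 200],
       [25, 50, 100]]
print(relative_ratios(img, 1, 1))
# [[0.5, 1.0, 1.5], [1.0, 1.0, 2.0], [0.25, 0.5, 1.0]]
```

In the GPU version, one such computation corresponds to one single-thread block of the NR×NC×1 grid; how the per-pixel ratios are laid out inside SI_RR is not specified in the paper.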


Figure 1: Division of Pixels in Four Surrounding Quadrants for Computing Homogeneity Matrix (around the central pixel p(i, j): Q1 top-left, Q2 top-right, Q4 bottom-left, Q3 bottom-right).


Experimental Setup 

A machine running Windows 7 (64-bit) with an Intel Core i3 processor at 1.9 GHz and an NVIDIA GeForce GT 740M graphics processing unit has been used to develop and test the proposed method.


RESULTS 

Figure 2 compares the results of the proposed algorithm (fourth column) on some of the test LR images (first column) with those of the existing techniques (‘Bicubic’ and ‘Box’).


Table 1: Results of Testing on Grayscale Images

Quality Metric    Output HR Image (450×405), SF = 1.67    Output HR Image (450×405), SF = 2.78
SSIM              0.923                                   0.869
PSNR              35.679                                  29.564
SNR               28J96                                   22.772
MSE               19.852                                  70.172
CPU Time          0.38 s                                  0.62 s
GPU Time          0 J 2 s                                 0.31 s

Tables 1 and 2 show the results of applying the proposed algorithm to the gray-scale and color test images respectively, for different values of the scaling factor.



Figure 2: Comparison of the Proposed Technique on Some 
Test Images. 
Table 2: Results of Testing on Color (RGB) Images

Quality Metric    Output HR Image (450×405), SF = 1.67    Output HR Image (450×405), SF = 2.78
SSIM              0.972                                   0.931
PSNR              35.441                                  29.845
SNR               28.489                                  23.12
MSE               22.822                                  76.618
CPU Time          0.74 s                                  0.93 s
GPU Time          0.53 s                                  0.67 s


Further, the gain in time efficiency achieved by parallelizing the algorithm with GPU computing, compared with its serial version, is depicted in Figure 3.



Figure 3: Comparison of Time Efficiency, CPU (Serial Version) Versus GPU (Parallel Version), plotted against the Resolution Enhancement Factor.

DISCUSSION 

The quality metrics Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Signal-to-Noise Ratio (SNR) and Mean Square Error (MSE) have been measured for the generated HR image with respect to the original HR image, along with the GPU acceleration achieved with respect to the CPU (Central Processing Unit) code (see Tables 1 and 2). As the work has been done in MATLAB, the results of the proposed algorithm have been compared with some of the methods available in MATLAB (‘Bicubic’ and ‘Box’) (see Figure 2).
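For reference, MSE, PSNR and SNR can be computed as in the plain-Python sketch below. It uses the standard definitions and assumes 8-bit images (peak value 255) and SNR defined as the mean power of the reference image divided by the MSE; the paper does not state which implementation it used. SSIM is omitted because its windowed computation is longer than fits here.

```python
import math

PEAK = 255.0  # assumption: 8-bit grayscale or per-channel data


def mse(ref, out):
    """Mean squared error between two equally sized 2-D lists."""
    diffs = [(a - b) ** 2 for ra, rb in zip(ref, out) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def psnr(ref, out):
    """Peak signal-to-noise ratio in dB."""
    m = mse(ref, out)
    return math.inf if m == 0 else 10 * math.log10(PEAK ** 2 / m)


def snr(ref, out):
    """Signal-to-noise ratio in dB: mean power of ref divided by the MSE."""
    m = mse(ref, out)
    power = sum(a * a for row in ref for a in row) / sum(len(row) for row in ref)
    return math.inf if m == 0 else 10 * math.log10(power / m)


ref = [[100, 100], [100, 100]]
out = [[101, 99], [100, 100]]
print(mse(ref, out))  # 0.5
```

Since Tables 1 and 2 report mean values over the test images, the tabulated mean PSNR need not equal the PSNR computed from the tabulated mean MSE.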

The proposed method has been tested on color as well as gray-scale images. Two folders have been created for testing, each containing 20 test images. One folder contains gray-scale images and the other contains color facial images from the color FERET database (Phillips et al. (2000, 1998)). Portions of the research in this paper use the FERET database of facial images collected under the FERET program, sponsored by the DoD Counterdrug Technology Development Program Office (Phillips et al. (1998), Phillips et al. (2000)). Tables 1 and 2 show the mean values of the measured quality metrics and the time taken for the gray-scale and color images respectively. The term SF refers to the scaling factor, i.e. the face resolution enhancement factor. By comparison, it is inferred that applying the proposed technique with color based processing (i.e. of the R, G and B channels) gives better results than gray-scale only processing of color images.

Further, the efficiency of the proposed algorithm has been tested at different resolution enhancement factors. For this, the test images from FERET were down-scaled by the factors 0.6000, 0.3600, 0.2160, 0.1296 and 0.0778, and the percentage reduction in time cost was calculated in each case. Figure 3 compares the time cost of the proposed algorithm using GPU computing with that of the CPU code. The maximum percentage reduction in time cost, 28.38%, was achieved at a scaling factor of 1.67, and a reduction of 22.39% at a scaling factor of 12.86.
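The down-scaling factors listed above are successive powers of 0.6, and the corresponding enhancement factors are their reciprocals. The short Python sketch below reproduces them, together with the percentage reduction in time cost, assumed here to be 100 × (CPU time - GPU time) / CPU time. With the color-image times at SF = 1.67 in Table 2 (0.74 s and 0.53 s), this formula gives 28.38%. The function names are hypothetical.

```python
def enhancement_factors(base=0.6, steps=5):
    """Pairs (down-scaling factor base**k, enhancement factor 1 / base**k)."""
    return [(round(base ** k, 4), round(1 / base ** k, 2))
            for k in range(1, steps + 1)]


def time_reduction_pct(cpu_s, gpu_s):
    """Percentage reduction in time cost of the GPU run relative to the CPU run."""
    return 100.0 * (cpu_s - gpu_s) / cpu_s


print(enhancement_factors())
# [(0.6, 1.67), (0.36, 2.78), (0.216, 4.63), (0.1296, 7.72), (0.0778, 12.86)]
print(round(time_reduction_pct(0.74, 0.53), 2))  # 28.38
```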

CONCLUSION 

We have successfully developed and tested a GPU-accelerated face resolution enhancement algorithm based on pixels-homogeneity and relative-ratios in the MATLAB environment using CUDA based parallel computing. The proposed algorithm has been tested on facial images from the web and the FERET database (Phillips et al. (2000, 1998)) and compared with some existing techniques in terms of the quality metrics SSIM, PSNR, SNR and MSE. A considerable speedup was observed for the parallel execution of the proposed face resolution enhancement algorithm compared with its serial version, with a reduction in time cost of up to 28.38%.

As MATLAB interprets the code, and it is the interpreter that slows down the processing, we have used GPU computing to compensate for this loss in time efficiency while still benefiting from MATLAB's user-friendly environment for the efficient development of GPU computing based algorithms. From the quality metrics and the time efficiency improvement achieved by parallelization, we conclude that the proposed system is effective and efficient for resolution enhancement of facial images.

Future Scope 

In the near future, we intend to improve the proposed technique by making use of shared memory in the GPU, increasing the extent of parallelization for further acceleration, and introducing adaptability in the selection of the LR and HR patch sizes to achieve the desired resolution enhancement factors. Further, we intend to hybridize the proposed technique with boosting based analysis to improve the quality of the resultant HR facial image.
Algorithm 2 Resolution Enhancement Using Pixels-Homogeneity and Relative-Ratios Kernels

1: Load the Input Image/Frame from Video, SI

2: Get the Size of the Input LR (Low Resolution) Image: SNR = Number of Rows, SNC = Number of Columns

3: Get the Gray-Scale version or the Respective channel (R, G or B) of the Input Image (SIC)

4: Initialize GPU CUDA Kernels: Kernel1 for Homogeneity Matrix and Kernel2 for Relative-Ratios Matrix

5: Initialize GPU CUDA Kernel Constants: NR (Number of Rows), NC (Number of Columns) and Kernel dimensions

6: Initialize GPU CUDA Kernel Arguments: SI (Input Image), SI_H (Homogeneity Kernel Matrix) and SI_RR (Relative Ratios Kernel Matrix)

7: Get the Homogeneity Kernel Matrix by invoking CUDA Kernel1, Kernel Arguments: (SI, SI_H)

8: Get the Relative Ratios Kernel Matrix by invoking CUDA Kernel2, Kernel Arguments: (SI, SI_RR)

9: Initialize the matrix TIC to hold the target HR image of size: Number of rows TNR = (5/3)*SNR, Number of columns TNC = (5/3)*SNC, Class: unsigned 8-bit integer (uint8)

10: Initialize the Temporary Matrix T (5×5) as an envelope to hold the 3×3 patch from the LR image, and the Difference Matrix D (3×3) to hold the difference of the neighborhood pixels w.r.t. the central pixel

11: Loop1: For ri = 2:3:SNR-1 (Nested Loops to Scan the pixels in the LR Image)

12: Loop2: For ci = 2:3:SNC-1

13: Insert the (3×3) patch from the LR image into the center of T, i.e. into rows and columns 2 to 4 of T

14: Compute the values of Difference Matrix D (3×3), considering SI(ri, ci) as the central pixel

15: Find the mean of the pixel values in each of the four quadrants of the 3×3 LR patch as A1, A2, A3 and A4

16: Get the homogeneity code for pixel location (ri, ci), giving the status of homogeneity of all four quadrants in the LR patch

17: Loop3: For i = 1:4 (Scan the four Quadrants)

18: if Hi = 1 then

19: Homogeneous Quadrant: Assign the corresponding quadrant mean value (Ai) to the unassigned pixels of envelope T

20: else

21: Non-Homogeneous Quadrant: Translate the difference matrix D of the LR patch to the respective corner (3×3 patch) of envelope T (HR patch of size 5×5)

22: Iterate Loop3 for all quadrants

23: Multiply the generated (5×5) patch with the respective patch from the Relative-Ratios kernel to generate the resultant HR patch

24: Compute the target locations for inserting the computed 5×5 HR patch into the resultant HR image (TIC)

25: Iterate Loop2 for the next pixel location

26: Iterate Loop1 for the next pixel location

27: HR Image TIC Generated
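The scan in steps 9, 11, 12 and 24 of Algorithm 2 tiles the HR image with 5×5 patches, one per non-overlapping 3×3 LR patch. A Python sketch of that index bookkeeping follows, using 1-based inclusive indices as in MATLAB. It assumes SNR and SNC are multiples of 3, so that the patches exactly fill the (5/3)·SNR × (5/3)·SNC target, and that the k-th LR patch maps to HR rows (or columns) 5k+1 to 5k+5; the function names are hypothetical.

```python
def target_size(snr, snc):
    """Step 9: TNR = (5/3)*SNR and TNC = (5/3)*SNC (exact if divisible by 3)."""
    return 5 * snr // 3, 5 * snc // 3


def hr_patch_targets(snr, snc):
    """Yield (ri, ci, (row_start, row_end), (col_start, col_end)), 1-based.

    (ri, ci) is the central pixel of each 3x3 LR patch (steps 11-12) and the
    ranges give where its 5x5 HR patch is inserted into TIC (step 24).
    """
    for ri in range(2, snr, 3):          # MATLAB: ri = 2:3:SNR-1
        kr = (ri - 2) // 3               # 0-based patch row index
        for ci in range(2, snc, 3):      # MATLAB: ci = 2:3:SNC-1
            kc = (ci - 2) // 3           # 0-based patch column index
            yield ri, ci, (5 * kr + 1, 5 * kr + 5), (5 * kc + 1, 5 * kc + 5)


patches = list(hr_patch_targets(6, 9))
print(len(patches), patches[-1], target_size(6, 9))
# 6 (5, 8, (6, 10), (11, 15)) (10, 15)
```

For a 6×9 LR image, the six 5×5 HR patches cover the 10×15 target exactly once; if SNR or SNC is not a multiple of 3, the last rows or columns of TIC would be left untouched by this scan.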